ACF: The autocorrelation function (ACF) defines how data points in a time series are related, on average, to the preceding data points (Box, Jenkins, & Reinsel, 1994). In other words, it measures the self-similarity of the signal over different delay times. Accordingly, the ACF is a function of the delay or lag τ, which determines the time shift taken into the past to estimate the similarity between data points. https://en.wikipedia.org/wiki/Autocorrelation
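As a minimal sketch of the definition above, the sample ACF can be computed directly with NumPy. The function name `sample_acf` and the toy sine signal are illustrative assumptions and are not taken from the original analysis, which appears to have been run in R.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean before correlating
    denom = np.dot(x, x)                  # lag-0 sum of squares
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])

# Toy signal (assumption): a sine wave with a period of 20 samples.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
acf = sample_acf(signal, 20)
print(np.round(acf[[0, 10, 20]], 2))      # lag 0, half a period, one full period
```

For a periodic signal like this one, the ACF is strongly negative at half a period and strongly positive again at a full period, which is the "self-similarity over delay" the definition refers to.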
PACF: The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. https://en.wikipedia.org/wiki/Partial_autocorrelation_function
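One common way to estimate the partial autocorrelation at lag k is to regress the series on its first k lags and take the coefficient on the longest lag. The sketch below does this with ordinary least squares in NumPy; `pacf_ols` and the simulated AR(1) series are assumptions made for illustration only.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """PACF at lags 0..max_lag via successive AR(k) least-squares fits."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    out = [1.0]                                   # PACF at lag 0 is 1 by convention
    for k in range(1, max_lag + 1):
        # Columns are x[t-1], ..., x[t-k] for t = k..n-1.
        design = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(design, x[k:], rcond=None)
        out.append(coef[-1])                      # coefficient on the longest lag
    return np.array(out)

# Toy AR(1) series (assumption): x[t] = 0.7 * x[t-1] + noise.
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for i in range(1, 2000):
    ar1[i] = 0.7 * ar1[i - 1] + noise[i]

pacf = pacf_ols(ar1, 3)
print(np.round(pacf, 2))
```

For an AR(1) process the PACF should be close to the AR coefficient at lag 1 and close to zero at longer lags, since those lags carry no additional information once lag 1 is accounted for.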
In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other.
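To illustrate, the sketch below computes the correlation between an outdoor-like series shifted into the past and an indoor-like series, then picks the lag with the highest correlation. The names `cross_corr`, `outdoor`, and `indoor` and the synthetic data are assumptions; how the "best lag" reported later in this document was selected is not stated, so taking the maximum is only one possible choice.

```python
import numpy as np

def cross_corr(x, y, max_lag):
    """Correlation between x[t - k] and y[t] for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.array([np.corrcoef(x[: n - k], y[k:])[0, 1] for k in range(max_lag + 1)])

# Synthetic data (assumption): indoor follows outdoor with a 5-sample delay.
rng = np.random.default_rng(1)
outdoor = rng.normal(size=500)
indoor = np.zeros(500)
indoor[5:] = 0.8 * outdoor[:-5]
indoor += 0.1 * rng.normal(size=500)

cc = cross_corr(outdoor, indoor, 12)
best_lag = int(np.argmax(cc))
print(best_lag, round(float(cc[best_lag]), 3))
```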
The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:
\[y_{t}=a_{0}+a_{1}y_{t-1}+a_{2}y_{t-2}+\cdots +a_{m}y_{t-m}+\text{error}_{t}.\]
Next, the autoregression is augmented by including lagged values of x:
\[y_{t}=a_{0}+a_{1}y_{t-1}+a_{2}y_{t-2}+\cdots +a_{m}y_{t-m}+b_{p}x_{t-p}+\cdots +b_{q}x_{t-q}+\text{error}_{t}.\]
One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is no explanatory power jointly added by the x’s). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.
The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.
https://en.wikipedia.org/wiki/Granger_causality
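The comparison of the two regressions can be sketched as a single joint F-test. Note that this sketch includes all lags 1..`order` of both series, as in the `Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)` models shown in the output further down, and does not implement the lag-by-lag retention step described above. The function names and the synthetic series are assumptions for illustration.

```python
import numpy as np

def lag_matrix(s, lags, start):
    """Columns s[t - lag] for each lag, rows t = start..len(s)-1."""
    n = len(s)
    return np.column_stack([s[start - lag : n - lag] for lag in lags])

def granger_f(y, x, order):
    """F statistic for H0: lags 1..order of x add nothing to an AR(order) model of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[order:]
    lags = range(1, order + 1)
    restricted = np.hstack([np.ones((len(target), 1)), lag_matrix(y, lags, order)])
    full = np.hstack([restricted, lag_matrix(x, lags, order)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    df_full = len(target) - full.shape[1]          # residual df of the full model
    f_stat = ((rss_r - rss_f) / order) / (rss_f / df_full)
    return f_stat, order, df_full

# Synthetic data (assumption): x drives y with a one-step delay, not the reverse.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + 0.5 * rng.normal()

f_xy, df1, df2 = granger_f(y, x, order=2)          # does x help predict y?
f_yx, _, _ = granger_f(x, y, order=2)              # does y help predict x?
print(round(f_xy, 1), round(f_yx, 2), df1, df2)
```

The degrees-of-freedom bookkeeping here is n - order observations minus 1 + 2 * order parameters. For 3882 records and order 2 that gives 3875, which agrees with the `Res.Df` column of the raw-data test below. A p-value would follow from the upper tail of the F distribution with (`df1`, `df2`) degrees of freedom, for example via `scipy.stats.f.sf`.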
# Update (May 1)
Consistent date range: March 29 to April 24
Number of complete sensors: 11
Rule: Find each indoor AQI reading that exceeds the maximum outdoor AQI observed within the time window (6 hours), flag it as an outlier, and replace it.
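A minimal sketch of this rule follows. The rule does not say how the window is aligned or what value replaces an outlier, so two assumptions are made here and should be read as such: the window is trailing (the current sample and the ones before it), and an outlier is capped at the window's outdoor maximum. At the 10-minute sampling implied by the lag tables (27 lags = 4.5 hours), a 6-hour window would be 36 samples; the toy example uses a window of 3 to stay readable.

```python
import numpy as np

def clean_indoor(indoor, outdoor, window):
    """Flag indoor readings above the trailing-window outdoor maximum and cap them.

    Trailing alignment and capping are assumptions, not the report's stated method.
    """
    indoor = np.asarray(indoor, dtype=float)
    outdoor = np.asarray(outdoor, dtype=float)
    cleaned = indoor.copy()
    flags = np.zeros(len(indoor), dtype=bool)
    for i in range(len(indoor)):
        lo = max(0, i - window + 1)
        ceiling = outdoor[lo : i + 1].max()        # max outdoor value in the window
        if indoor[i] > ceiling:
            flags[i] = True
            cleaned[i] = ceiling                   # hypothetical replacement value
    return cleaned, flags

# Toy readings (made up for illustration).
outdoor = [10, 12, 11, 9, 8, 8]
indoor = [5, 30, 6, 7, 20, 4]
cleaned, flags = clean_indoor(indoor, outdoor, window=3)
pct = 100 * flags.sum() / len(flags)
print(cleaned.tolist(), f"outlier %: {flags.sum()}/{len(flags)} = {pct:.2f}%")
```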
## [1] "outlier %: 1003/3882 = 25.84%"
Concern 1: outlier % is high
Compare before and after outlier removal
## [1] "For cleaned data, the best lag: 27 (4.5 hours)"
Granger causality test results:
For raw data:
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3875
## 2 3877 -2 6.5785 0.001406 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For clean data:
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)
## Model 2: AQI_i ~ Lags(AQI_i, 1:27)
## Res.Df Df F Pr(>F)
## 1 3800
## 2 3827 -27 9.0202 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the cleaned data, design a placebo test:
Model 0: AQI_i ~ Lags(AQI_i, 1:27)
Experimental group (outdoor AQI from the preceding 4.5 hours): Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)
Placebo group (the same 4.5-hour block of outdoor AQI, shifted a further \(t\) steps into the past): Model 2: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, (1+t):(27+t))
Here, \(t\) is measured in 10-minute steps and can be 24*6 (1 day), 48*6 (2 days), …, 24*7*6 (1 week). The placebo model therefore uses a 4.5-hour block of outdoor AQI from \(t\) steps earlier as a predictor of indoor AQI, which does not make sense physically, so a high p-value is expected.
Test Model 0 against Model 1 and Model 0 against Model 2, then compare the degree of significance (p-value) of the two tests.
Conclusion: As expected, the p-values of the experimental group are lower than those of the placebo group.
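The placebo design can be sketched by adding a `shift` to the outdoor lags before running the same joint F-test. Everything below (the function name `shifted_granger_f`, the synthetic series, and the choice of 3 lags) is an assumption for illustration; only the shift of 144 steps corresponds to the one-day offset named above. With `shift=0` the function tests Model 0 against Model 1, and with `shift=t` it tests Model 0 against Model 2.

```python
import numpy as np

def shifted_granger_f(y, x, order, shift=0):
    """F statistic for adding x lags (1+shift)..(order+shift) to an AR(order) model of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    start = order + shift                          # first row with every lag available
    target = y[start:]

    def lagged(s, lags):
        return np.column_stack([s[start - k : n - k] for k in lags])

    base = np.hstack([np.ones((n - start, 1)), lagged(y, range(1, order + 1))])
    full = np.hstack([base, lagged(x, range(1 + shift, order + shift + 1))])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_0, rss_1 = rss(base), rss(full)
    df_full = len(target) - full.shape[1]
    return ((rss_0 - rss_1) / order) / (rss_1 / df_full)

# Synthetic data (assumption): indoor depends on the previous outdoor sample only.
rng = np.random.default_rng(2)
n = 2000
outdoor = rng.normal(size=n)
indoor = np.zeros(n)
for i in range(1, n):
    indoor[i] = 0.5 * indoor[i - 1] + 0.5 * outdoor[i - 1] + 0.5 * rng.normal()

f_exp = shifted_granger_f(indoor, outdoor, order=3)                 # experimental group
f_placebo = shifted_granger_f(indoor, outdoor, order=3, shift=144)  # one day earlier
print(round(f_exp, 1), round(f_placebo, 2))
```

A larger F statistic corresponds to a smaller p-value, so the experimental group should show a much larger F than the placebo group when the recent outdoor values are what actually carries predictive information.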
## # A tibble: 11 x 8
## participant `Valid records` `outlier %` `time_lags (mins)` `time_lags (hours~
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 126853 3882 25.8 270 4.5
## 2 125627 3860 65.7 80 1.33
## 3 126603 3808 76.3 200 3.33
## 4 127177 3888 40.7 10 0.167
## 5 127183 3829 53.3 130 2.17
## 6 127187 3874 43.6 10 0.167
## 7 127213 3888 38.3 310 5.17
## 8 127221 3857 30.0 30 0.5
## 9 127227 3887 56.5 10 0.167
## 10 127305 2327 30.6 30 0.5
## 11 127303 3882 54.1 20 0.333
## # ... with 3 more variables: P <dbl>, score <dbl>, Resistance <dbl>
\[\text{Score} = -\log(P\text{-value})\]
\[\text{Resistance} = \frac{\text{Score} - \min(\text{Score})}{\max(\text{Score}) - \min(\text{Score})}\]
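The two formulas amount to a negative-log transform followed by min-max scaling, sketched below using the p-values printed in the per-sensor output that follows (in their printed order). One of those printed values is exactly 0, for which the negative log is infinite; the sketch clips p-values at a floor before taking the log. That floor (`P_FLOOR`) is a hypothetical choice made here, not something the report specifies, and the resulting Resistance values depend on it.

```python
import math

P_FLOOR = 1e-300  # hypothetical floor so that -log(0) stays finite (assumption)

def resistance_scores(p_values, floor=P_FLOOR):
    """Score = -log(p), then min-max scale the scores to [0, 1]."""
    scores = [-math.log(max(p, floor)) for p in p_values]
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# P-values as printed after each cleaned-data Granger test below.
p_values = [
    0.007327019, 0, 6.826752e-24, 1.737408e-21, 3.599921e-22, 4.226207e-33,
    1.815044e-05, 1.0281e-29, 2.266182e-12, 1.338759e-16, 2.0204e-35,
]
resistance = resistance_scores(p_values)
for p, r in zip(p_values, resistance):
    print(f"{p:.3e} -> {r:.3f}")
```

Because min-max scaling divides out any constant factor, the base of the logarithm does not affect Resistance; only the ordering and relative spacing of the scores matter.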
## [1] "outlier %: 2536/3860 = 65.7%"
## [1] "For cleaned data, the best lag: 8 (1.33 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3853
## 2 3855 -2 0.3296 0.7192
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:8) + Lags(AQI_o, 1:8)
## Model 2: AQI_i ~ Lags(AQI_i, 1:8)
## Res.Df Df F Pr(>F)
## 1 3835
## 2 3843 -8 2.6217 0.007327 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0.007327019
## [1] "outlier %: 2906/3808 = 76.31%"
## [1] "For cleaned data, the best lag: 20 (3.33 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
## Res.Df Df F Pr(>F)
## 1 3804
## 2 3805 -1 6.7565 0.009377 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:20) + Lags(AQI_o, 1:20)
## Model 2: AQI_i ~ Lags(AQI_i, 1:20)
## Res.Df Df F Pr(>F)
## 1 3747
## 2 3767 -20 110 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0
## [1] "outlier %: 1583/3888 = 40.72%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3881
## 2 3883 -2 22.113 2.822e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
## Res.Df Df F Pr(>F)
## 1 3884
## 2 3885 -1 102.94 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 6.826752e-24
## [1] "outlier %: 2042/3829 = 53.33%"
## [1] "For cleaned data, the best lag: 13 (2.17 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3822
## 2 3824 -2 1.3506 0.2592
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:13) + Lags(AQI_o, 1:13)
## Model 2: AQI_i ~ Lags(AQI_i, 1:13)
## Res.Df Df F Pr(>F)
## 1 3789
## 2 3802 -13 10.191 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.737408e-21
## [1] "outlier %: 1688/3874 = 43.57%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3867
## 2 3869 -2 18.709 8.197e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
## Res.Df Df F Pr(>F)
## 1 3870
## 2 3871 -1 94.896 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 3.599921e-22
## [1] "outlier %: 1488/3888 = 38.27%"
## [1] "For cleaned data, the best lag: 31 (5.17 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:21) + Lags(AQI_o, 1:21)
## Model 2: AQI_i ~ Lags(AQI_i, 1:21)
## Res.Df Df F Pr(>F)
## 1 3824
## 2 3845 -21 2.4063 0.0003285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:31) + Lags(AQI_o, 1:31)
## Model 2: AQI_i ~ Lags(AQI_i, 1:31)
## Res.Df Df F Pr(>F)
## 1 3794
## 2 3825 -31 7.7704 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 4.226207e-33
## [1] "outlier %: 1159/3857 = 30.05%"
## [1] "For cleaned data, the best lag: 3 (0.5 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3850
## 2 3852 -2 10.199 3.821e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:3) + Lags(AQI_o, 1:3)
## Model 2: AQI_i ~ Lags(AQI_i, 1:3)
## Res.Df Df F Pr(>F)
## 1 3847
## 2 3850 -3 8.2467 1.815e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.815044e-05
## [1] "outlier %: 2197/3887 = 56.52%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
## Res.Df Df F Pr(>F)
## 1 3883
## 2 3884 -1 47.224 7.346e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
## Res.Df Df F Pr(>F)
## 1 3883
## 2 3884 -1 130.33 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.0281e-29
## [1] "outlier %: 713/2327 = 30.64%"
## [1] "For cleaned data, the best lag: 3 (0.5 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 2320
## 2 2322 -2 5.465 0.004287 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:3) + Lags(AQI_o, 1:3)
## Model 2: AQI_i ~ Lags(AQI_i, 1:3)
## Res.Df Df F Pr(>F)
## 1 2317
## 2 2320 -3 19.319 2.266e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2.266182e-12
## [1] "outlier %: 2101/3882 = 54.12%"
## [1] "For cleaned data, the best lag: 2 (0.33 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3875
## 2 3877 -2 12.097 5.792e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3875
## 2 3877 -2 36.897 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.338759e-16
## [1] "outlier %: 1003/3882 = 25.84%"
## [1] "For cleaned data, the best lag: 27 (4.5 hours)"
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
## Res.Df Df F Pr(>F)
## 1 3875
## 2 3877 -2 6.5785 0.001406 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
##
## Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)
## Model 2: AQI_i ~ Lags(AQI_i, 1:27)
## Res.Df Df F Pr(>F)
## 1 3800
## 2 3827 -27 9.0202 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2.0204e-35